Audio-visual synchronisation for speaker diarisation

Authors

  • Giulia Garau
  • Alfred Dielmann
  • Hervé Bourlard
Abstract

The role of audio–visual speech synchrony for speaker diarisation is investigated in the multiparty meeting domain. We measured both mutual information and canonical correlation on different sets of audio and video features. As acoustic features we considered energy and MFCCs. As visual features we experimented with both motion intensity features, computed on the whole image, and Kanade-Lucas-Tomasi (KLT) motion estimation. Using KLT, we decomposed the motion into its horizontal and vertical components. The vertical component was found to be more reliable for speech synchrony estimation. The mutual information between acoustic energy and KLT vertical motion of skin pixels not only resulted in a 20% relative improvement over an MFCC-only diarisation system, but also outperformed visual features such as motion intensities and head poses.
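As a rough illustration of the synchrony measure mentioned in the abstract, the sketch below scores how strongly a frame-level acoustic energy track co-varies with each participant's vertical motion track, and picks the participant with the highest score. It is only a sketch under stated assumptions: the abstract does not say how mutual information was estimated, so the closed-form Gaussian estimate I = -0.5 * ln(1 - rho^2) is assumed here, and the names `gaussian_mi` and `pick_speaker`, as well as the per-participant dictionary layout, are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def gaussian_mi(x, y):
    """Mutual information in nats, ASSUMING x and y are jointly Gaussian.

    This estimator is an assumption of this sketch, not taken from the paper.
    """
    rho2 = min(pearson(x, y) ** 2, 1.0 - 1e-12)  # clip to keep the log finite
    return -0.5 * math.log(1.0 - rho2)

def pick_speaker(audio_energy, motion_by_person):
    """Return (best_name, scores) where scores maps each person to their MI.

    motion_by_person is a hypothetical layout: name -> vertical motion per frame,
    aligned frame by frame with audio_energy.
    """
    scores = {name: gaussian_mi(audio_energy, motion)
              for name, motion in motion_by_person.items()}
    return max(scores, key=scores.get), scores
```

In practice the score would be computed over a sliding window of frames rather than a whole recording, so that the decision can follow speaker changes; the window length is a tuning choice not given in the abstract.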


Similar resources

An Overview of Automatic Speaker Diarisation Systems

Audio diarisation is the process of annotating an input audio channel with information that attributes (possibly overlapping) temporal regions of signal energy to their specific sources. These sources can include particular speakers, music, background noise sources and other signal source/channel characteristics. Diarisation can be used for helping speech recognition, facilitating the searching...


Talking head detection by likelihood-ratio test

Detecting accurately when a person whose face is visible in an audio-visual medium is the audible speaker is an enabling technology with a number of useful applications. These include fused audio/visual speaker recognition, AV (audio/visual) segmentation and diarization as well as AV synchronization. The likelihood-ratio test formulation and feature signal processing employed here allow the use...


Robust audio-visual speech synchrony detection by generalized bimodal linear prediction

We study the problem of detecting audio-visual synchrony in video segments containing a speaker in frontal head pose. The problem holds a number of important applications, for example speech source localization, speech activity detection, speaker diarization, speech source separation, and biometric spoofing detection. In particular, we build on earlier work, extending our previously proposed ti...


Towards Audio-Visual On-line Diarization Of Participants In Group Meetings

We propose a fully automated, unsupervised, and non-intrusive method of identifying the current speaker audio-visually in a group conversation. This is achieved without specialized hardware, user interaction, or prior assignment of microphones to participants. Speakers are identified acoustically using a novel on-line speaker diarization approach. The output is then used to find the correspondi...


Multimodal speaker diarization using oriented optical flow histograms

Speaker diarization is the task of partitioning an input stream into speaker homogeneous regions, or in other words, to determine “who spoke when.” While approaches to this problem have traditionally relied entirely on the audio stream, the availability of accompanying video streams in recent diarization corpora has prompted the study of methods based on multimodal audio-visual features. In thi...



Publication date: 2010